Improving Software Performance in the Compute Unified Device Architecture
Author
Abstract
This paper analyzes several aspects of improving software performance for applications written in the Compute Unified Device Architecture (CUDA). We address an issue of great importance when programming a CUDA application: management of the Graphics Processing Unit's (GPU's) memory, studied through transpose kernels. We also benchmark and evaluate the performance of a matrix transpose application in CUDA as it is progressively optimized. Of particular interest was how well the optimization techniques applied to software written in CUDA scale to the latest generation of general-purpose graphics processing units (GPGPUs), namely the Fermi architecture implemented in the GTX480, compared with the previous architecture implemented in the GTX280. There has recently been a lot of interest in the literature in this type of optimization analysis, but to the best of our knowledge none of the works so far has tried to validate whether the optimizations apply to a GPU from the latest Fermi architecture, or how well the Fermi architecture scales with these performance-improving techniques.
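The abstract does not reproduce the paper's kernels, so the following is only a minimal sketch of the kind of progression it describes: a naive transpose kernel next to a shared-memory tiled one. The kernel names, the 16x16 tile size, and the matrix dimensions are assumptions chosen for illustration, not values taken from the paper.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical tile edge; 16x16 = 256 threads per block.
constexpr int TILE = 16;

// Naive version: reads from `in` are contiguous across a row of threads,
// but writes to `out` are strided by `height`, so they are not coalesced.
__global__ void transposeNaive(float *out, const float *in, int width, int height) {
    int x = blockIdx.x * TILE + threadIdx.x;  // column in the input
    int y = blockIdx.y * TILE + threadIdx.y;  // row in the input
    if (x < width && y < height)
        out[x * height + y] = in[y * width + x];
}

// Tiled version: stage a tile in shared memory so that both the global
// read and the global write walk contiguous addresses.
__global__ void transposeTiled(float *out, const float *in, int width, int height) {
    // The extra column of padding avoids shared-memory bank conflicts
    // when the tile is read column-wise.
    __shared__ float tile[TILE][TILE + 1];

    int x = blockIdx.x * TILE + threadIdx.x;
    int y = blockIdx.y * TILE + threadIdx.y;
    if (x < width && y < height)
        tile[threadIdx.y][threadIdx.x] = in[y * width + x];
    __syncthreads();

    // Swap the block indices: this block writes the transposed tile.
    x = blockIdx.y * TILE + threadIdx.x;  // column in the output (output width = height)
    y = blockIdx.x * TILE + threadIdx.y;  // row in the output
    if (x < height && y < width)
        out[y * height + x] = tile[threadIdx.x][threadIdx.y];
}

// Host-side check: out must equal the transpose of in.
static bool isTranspose(const float *out, const float *in, int width, int height) {
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x)
            if (out[x * height + y] != in[y * width + x]) return false;
    return true;
}

int main() {
    const int width = 1000, height = 700;  // deliberately not multiples of TILE
    const size_t bytes = size_t(width) * height * sizeof(float);

    float *hIn = static_cast<float *>(malloc(bytes));
    float *hOut = static_cast<float *>(malloc(bytes));
    for (int i = 0; i < width * height; ++i) hIn[i] = float(i);

    float *dIn = nullptr, *dOut = nullptr;
    cudaMalloc(&dIn, bytes);
    cudaMalloc(&dOut, bytes);
    cudaMemcpy(dIn, hIn, bytes, cudaMemcpyHostToDevice);

    dim3 block(TILE, TILE);
    dim3 grid((width + TILE - 1) / TILE, (height + TILE - 1) / TILE);

    transposeNaive<<<grid, block>>>(dOut, dIn, width, height);
    cudaMemcpy(hOut, dOut, bytes, cudaMemcpyDeviceToHost);
    printf("naive: %s\n", isTranspose(hOut, hIn, width, height) ? "ok" : "MISMATCH");

    cudaMemset(dOut, 0, bytes);  // clear so the second check cannot pass on stale data
    transposeTiled<<<grid, block>>>(dOut, dIn, width, height);
    cudaMemcpy(hOut, dOut, bytes, cudaMemcpyDeviceToHost);
    printf("tiled: %s\n", isTranspose(hOut, hIn, width, height) ? "ok" : "MISMATCH");

    cudaFree(dIn);
    cudaFree(dOut);
    free(hIn);
    free(hOut);
    return 0;
}
```

Both kernels compute the same result; they differ only in global-memory access pattern, which is the kind of memory-management effect the abstract says was benchmarked. Timing the two kernels (for example with CUDA events) on different GPU generations would be the natural next step, but no timings are claimed here.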
Similar resources
Tool Support for Software Performance Risk Assessment
Archana Radhakrishnan. The Software Architecture Risk Assessment (SARA) tool is a utility to compute and analyze different architectural risk factors of software architecture modeled using Unified Modeling Language (UML). The different architectural risk factors are maintainability, requirements, reliability, and performance. The problem repo...
Improving the performance of UPQC under unbalanced and distortional load conditions: A new control method
This paper presents a new control method for a three-phase four-wire Unified Power Quality Conditioner (UPQC) to deal with the problems of power quality under distortional and unbalanced load conditions. The proposed control approach is the combination of instantaneous power theory and Synchronous Reference Frame (SRF) theory which is optimized by using a self-tuning filter (STF) and without us...
Performance Analysis of Mono-bit Digital Instantaneous Frequency Measurement (DIFM) Device
Instantaneous Frequency Measurement (IFM) devices are essential parts of any ESM, ELINT, and RWR receiver. Analog IFMs have been used for several decades. However, these devices are bulky, complex and expensive. Nowadays, there is great interest in developing wideband, high dynamic range, and accurate Digital IFMs. One Digital IFM that has suitably reached all these requirements is mono-bi...
Parallel Prefix Scan with Compute Unified Device Architecture (CUDA)
Parallel prefix scan, also known as parallel prefix sum, is a building block for many parallel algorithms including polynomial evaluation, sorting and building data structures. This paper introduces prefix scan and also describes a step-by-step procedure to implement prefix scan efficiently with Compute Unified Device Architecture (CUDA). This paper starts with a basic naive algorithm and procee...
The Optimization of Algorithms in the Process of Temporal Data Mining Using the Compute Unified Device Architecture
Considering the importance and usefulness of real time data mining, in recent years the concern of researchers to discover new hardware architectures that can manage and process large volumes of data has increased significantly. In this paper the performance of algorithms for temporal data mining that are implemented in the new Compute Unified Device Architecture (CUDA) from the latest generati...
Journal title:
Volume, issue:
Pages: -
Publication date: 2010